Deriving continuous grounded meaning representations from referentially structured multimodal contexts
Authors
Abstract
Corpora of referring expressions paired with their visual referents are a good source for learning word meanings directly grounded in visual representations. Here, we explore additional ways of extracting from them word representations linked to multimodal context: through expressions that refer to the same object, and through expressions that refer to different objects in the same scene. We show that continuous meaning representations derived from these contexts capture complementary aspects of similarity, although they do not outperform textual embeddings trained on very large amounts of raw text when tested on standard similarity benchmarks. We propose a new task for evaluating grounded meaning representations, the detection of potentially co-referential phrases, and show that it requires precise denotational representations of attribute meanings, which our method provides.
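The two context types named in the abstract (expressions referring to the same object, and expressions referring to different objects in the same scene) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the `context_pairs` function, and the plain co-occurrence counting are all assumptions made for illustration, and the abstract does not say how the continuous representations are trained from such contexts.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy corpus: (scene id, object id, referring expression).
corpus = [
    ("scene1", "obj1", "red car"),
    ("scene1", "obj1", "small car left"),
    ("scene1", "obj2", "blue truck"),
    ("scene2", "obj3", "red bus"),
]

def context_pairs(corpus):
    """Count (word, context word) pairs for two assumed context types.

    same_object: words from two different expressions for the same object.
    same_scene:  words from expressions for different objects in one scene.
    """
    same_object = Counter()
    same_scene = Counter()
    # Ordered pairs of distinct corpus entries, so counts are symmetric.
    for (s1, o1, e1), (s2, o2, e2) in permutations(corpus, 2):
        if s1 != s2:
            continue  # only expressions from the same scene form a context
        target = same_object if o1 == o2 else same_scene
        for word in e1.split():
            for context_word in e2.split():
                target[(word, context_word)] += 1
    return same_object, same_scene

same_object, same_scene = context_pairs(corpus)
```

Counts like these could then feed any standard embedding method; which one the paper uses is not stated in the abstract.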
Similar resources
The Significance of Multimodality/Multiliteracies in Iranian EFL Learners’ Meaning-Making Process
The main objective of this study was to investigate how Iranian EFL learners used their literacy practices and multimodal resources to mediate interpretation and representation of an advertisement text and construct their understanding of it. Fifteen female adolescents at an intermediate level of proficiency read the "مبلمان برلیان" (“Brelian Furniture”) advertisement text and re-created their ...
Learning Visual Attributes from Image and Text
Visual attributes are the words describing appearance properties of an object. For example, one might use gray or brown and furry to describe a cat. Visual attributes have been studied in the computer vision community [4], where the main focus has been on the automatic recognition of attributes from an image. Visual attributes have an interesting property in that they are linguistic entities an...
Peer-Assessment and Student-Driven Negotiation of Meaning: Two Ingredients for Creating Social Presence in Online EFL Social Contexts
With the current availability of state-of-the-art technology, particularly the Internet, people have expanded their channels of communication. This has similarly led to many people utilizing technology to learn second/foreign languages. Nevertheless, many current computer-assisted language learning (CALL) programs still appear to be lacking in interactivity and what is termed social presence, w...
Semiotic schemas: A framework for grounding language in action and perception
A theoretical framework for grounding language is introduced that provides a computational path from sensing and motor action to words and speech acts. The approach combines concepts from semiotics and schema theory to develop a holistic approach to linguistic meaning. Schemas serve as structured beliefs that are grounded in an agent’s physical environment through a causal-predictive cycle of a...
Grounded spoken language acquisition: experiments in word learning
Language is grounded in sensory-motor experience. Grounding connects concepts to the physical world, enabling humans to acquire and use words and sentences in context. Currently most machines which process language are not grounded. Instead, semantic representations are abstract, pre-specified, and have meaning only when interpreted by humans. We are interested in developing computational syste...